# Multilingual speech translation
Ultravox v0.6 Qwen 3 32B
MIT
Ultravox is a large multimodal speech language model that understands and processes speech input. It supports multiple languages and is designed to cope with noisy environments.
Audio-to-Text
Transformers Supports Multiple Languages

fixie-ai
1,240
0
Phi-4 Multimodal Instruct
MIT
Phi-4-multimodal-instruct is a lightweight open-source multimodal foundation model that builds on the language, vision, and speech research and datasets used for the Phi-3.5 and 4.0 models. It accepts text, image, and audio inputs, generates text outputs, and has a context length of 128K tokens.
Multimodal Fusion
Transformers Supports Multiple Languages

Robeeeeeeeeeee
21
1
SeamlessM4T v2 Large
SeamlessM4T is a large-scale multilingual multimodal machine translation model supporting speech and text translation in nearly 100 languages.
Text-to-Audio
Supports Multiple Languages
audo
39
17
SeamlessM4T v2 Large
SeamlessM4T v2 is a large-scale multilingual multimodal machine translation model released by Facebook, supporting speech and text translation for nearly 100 languages.
Text-to-Audio
Transformers Supports Multiple Languages

facebook
64.59k
821
Wav2Vec2 XLS-R 2B 21-to-En
Apache-2.0
Facebook's Wav2Vec2 XLS-R model fine-tuned to translate speech in 21 languages into English.
Speech Recognition
Transformers Supports Multiple Languages

facebook
38
5
S2T Medium MuST-C Multilingual ST
MIT
A Transformer-based, end-to-end multilingual speech translation model that translates English speech into multiple languages.
Speech Recognition
Transformers Supports Multiple Languages

facebook
7,322
6
Wav2Vec2 XLS-R 1B 21-to-En
Apache-2.0
Facebook's Wav2Vec2 XLS-R model fine-tuned to translate speech in 21 languages into English.
Speech Recognition
Transformers Supports Multiple Languages

facebook
511
3
Wav2Vec2 XLS-R 300M 21-to-En
Apache-2.0
Facebook's Wav2Vec2 XLS-R model fine-tuned for speech translation from 21 languages into English.
Speech Recognition
Transformers Supports Multiple Languages

facebook
464
5
Wav2Vec2 XLS-R 1B En-to-15
Apache-2.0
Facebook's Wav2Vec2 XLS-R model fine-tuned for speech translation tasks, supporting translation from English to 15 target languages.
Speech Recognition
Transformers Supports Multiple Languages

facebook
505
3
Wav2Vec2 XLS-R 2B En-to-15
Apache-2.0
Facebook's Wav2Vec2 XLS-R model fine-tuned to translate spoken English into written text in 15 target languages.
Speech Recognition
Transformers Supports Multiple Languages

facebook
27
1
Wav2Vec2 XLS-R 300M En-to-15
Apache-2.0
Facebook's Wav2Vec2 XLS-R model fine-tuned for multilingual speech translation tasks, supporting translation from English to 15 target languages.
Speech Recognition
Transformers Supports Multiple Languages

facebook
167
6
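The six Wav2Vec2 XLS-R entries above differ along only two axes: translation direction (21 languages into English, or English into 15 languages) and model size (300m, 1b, 2b). The sketch below makes that grid explicit. It assumes the Hugging Face Hub IDs follow the pattern `facebook/wav2vec2-xls-r-<size>-<direction>`; the helper names `xls_r_checkpoint` and `all_checkpoints` are hypothetical and not part of any library.

```python
# Hypothetical helper, not a library API: pick one of the six Wav2Vec2 XLS-R
# speech translation checkpoints listed above by direction and size.
# ASSUMPTION: Hub IDs follow "facebook/wav2vec2-xls-r-<size>-<direction>".

SIZES = ("300m", "1b", "2b")


def xls_r_checkpoint(into_english: bool, size: str = "300m") -> str:
    """Return the assumed Hub ID for a Wav2Vec2 XLS-R translation checkpoint.

    into_english=True  -> the "21 languages to English" variants
    into_english=False -> the "English to 15 languages" variants
    """
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}, got {size!r}")
    direction = "21-to-en" if into_english else "en-to-15"
    return f"facebook/wav2vec2-xls-r-{size}-{direction}"


def all_checkpoints() -> list[str]:
    """Enumerate every direction/size combination (2 x 3 = 6 checkpoints)."""
    return [xls_r_checkpoint(d, s) for d in (True, False) for s in SIZES]


if __name__ == "__main__":
    print(xls_r_checkpoint(into_english=False, size="1b"))
    # prints "facebook/wav2vec2-xls-r-1b-en-to-15"
```

Encoding the choice as a direction flag plus a size string mirrors how the listing itself is organized; a larger size trades inference cost for capacity, while the direction is fixed by the fine-tuning of each checkpoint.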